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Abstract: Shape restricted regressions, including isotonic regression and con- 
cave regression as special cases, are studied using priors on Bernstein polynomi- 
als and Markov chain Monte Carlo methods. These priors have large supports, 
select only smooth functions, can easily incorporate geometric information into 
the prior, and can be generated without computational difficulty. Algorithms 
generating priors and posteriors are proposed, and simulation studies are con- 
ducted to illustrate the performance of this approach. Comparisons with the 
density-regression method of Dette et al. (2006) are included. 



1. Introduction 

Estimation of a regression function with shape restriction is of considerable interest 
in many practical applications. Typical examples include the study of dose response 
experiments in medicine and the study of utility functions, product functions, profit 
functions and cost functions in economics, among others. Starting from the classic 
works of Brunk [3| and Hildreth [13] , there exists a large literature on the problem 
of estimating monotone, concave or convex regression functions. Because some of 
these estimates are not smooth, much effort has been devoted to the search of 
a simple, smooth and efficient estimate of a shape restricted regression function. 
Major approaches to this problem include the projection methods for constrained 



smoothing, which are discussed in Mammen et al. 2^ and contain smoothing splines 



methods and kernel and local polynomial methods and others as special cases, the 



isotonic regression approach studied by Mukerjee [2 5| . Mammen [22|,|23] and others, 
the tilting method proposed by Hall and Hua ng llq| . and the density-regression 
method proposed by Dette, Neumeyer and Pilz We note that both of the last 
two methods enjoy the same level of smoothness as the unconstrained counterpart 
and are applicable to general smoothing methods. Besides, the density-regression 
method is particularly computationally efficient and has a wide applicability. In 
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fact, the density-regression method was used to provide an efficient and smooth 
convex estimate for convex regression by Birke and Dette 0] . 

This paper studies a nonparamctric Bayes approach to shape restricted regres- 
sion where the prior is introduced by Bernstein polynomials. This prior features 
the properties that it has a large support, selects only smooth functions, can eas- 
ily incorporate geometric information into the prior, and can be generated without 
computational difficulty. We note that Lavine and Mockus [l8| also discussed Bayes 
methods for isotonic regression, but the prior they use is a Dirichlet process, whose 
sample paths are step functions. In addition to the above desirable properties, 
our approach can be applied to quite general shape restricted statistical inference 
problems, although we consider mainly isotonic regression and concave (convex) re- 
gression in this paper. To facilitate the discussion, we first introduce some notations 
as follows. 



For integers < i < n, let <Pi, n (t) = Cft^l - t) n ~\ where Cf = nl/(i\(n - 
{fi,n i — 0, . . . , n} is called the Bernstein basis for polynomials of order n. Let 
B n = W l+1 and B — U^Li({ n } x B n ). Let ir be a probability measure on B. For 
t > 0, we define F : B x [0, r] — ► M 1 by 



where (n, &o,nj • • • , b n , n ) G B and t G [0, r]. We also denote (1.1) by Fb n (t) if b n = 
(6o,nj ■ ■ • j^n.n)- The probability measure ir is called a Bernstein prior, and F is 
called the random Bernstein polynomial for tt. It is a stochastic process on [0, r] 
with smooth sample paths. Important references for Bernstein polynomials include 
Lorentz [20| and Altomare and Campiti 0], among others. It is well-known that 
Bernstein polynomials are popular in curve design (Prautzsch et al. |28|). 

Bernstein basis have played important roles in nonparametric curve estimation 
and in Bayesian statistical theory. Good examples of the former include Tenbusch 
[3ll ] and some of the references therein. For the latter, we note that Beta density 
P(x;a,b) — x a ~ 1 (l — ir) b ~ 1 /i?(a, b) is itself a Bernstein polynomial and mixtures 
of Beta densities of the kind Y^j=i Wj(3{x;aj,bj), where Wj > are random with 
Y^j=i w j — 1) were used to introduce priors that only select smooth density func- 
tions on [0,1] ; see Dalai and Hall [§}, Diaconis and Ylvisaker and Mallik and 
Gelfand [2l[, and references therein. We also note that Petrone [26| and Petrone 
and Wasserman j2?| studied priors on the set of distribution functions on [0, 1] that 
are specified by X)"=o ^ 1 (*/ n ) < / 3 j,n(^) with F being a Dirichlet process; this prior was 
referred to as a Bernstein-Dirichlet prior. 

For a continuous function F on [0, r], X)™=o ^(* T / n )¥ , i,n(^/ T ) i s called the n-th 
order Bernstein polynomial of F on [0,r]. We will see in Section 2 that much of 
the geometry of F is preserved by its Bernstein polynomials and very much of 
the geometry of a Bernstein polynomial can be read off from its coefficients. This 
together with Bernstein- Weierstrass approximation theorem suggests the possibility 
of a Bernstein prior on a space of functions with large enough support and specific 
geometric properties. These ideas were developed in Chang et al. for Bayesian 
inference of a convex cumulative hazard with right censored data. 

The purpose of this paper is to indicate that the above ideas are also useful in 
the estimation of shape restricted regressions. The regression model we consider in 
this paper assumes that given F^ satisfying certain shape restriction, 




(1.1) 
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where Xk are design points, Yjj~ are response variables and tjk are errors. 

We will investigate isotonic regression and concave (convex) regression in some 
detail. In particular, we will examine the numerical performance of this approach 
and offer comparison with the newly-developed density-regression method of Dcttc 



et al. 11] in simulation studies. We also indicate without elaboration that this 



approach can also be used to study regression function that is smooth and unimodal 
or that is smooth and is constantly zero initially, then increasing for a while, and 
finally decreasing. We note that the latter can be used to model the time course 
expression level of a virus gene, as discussed in Chang et al. @; a virus gene typically 
starts to express after it gets into a cell and its expression level is zero initially, 
increases for a while and then decreases. 

That it is very easy both to generate the prior and the posterior for inference 
with Bernstein prior is an important merit of this approach. Because the prior 
is defined on the union of subspaces of different dimension, we adapt Metropolis- 
Hastings reversible jump algorithm (MHRA) (Green [HI) to calculate the posterior 
distributions in this paper. 

This paper is organized as follows. Section 2 introduces the model, provides 
statements that exemplify the relationship between the shape of the graph of (1.1) 
and its coefficients bo, . . . , b n , and gives conditions under which the prior has full 
support with desired geometric peoperties. Section 3 illustrates the use of Bernstein 
priors in conducting Bayesian inference; in particular, suitable Bernstein priors are 
introduced for isotonic regression and unimodal concave (convex) regression, and 
Markov chain Monte Carlo approaches to generate posterior are proposed. Section 
4 provides simulation studies to compare our methods with the density-regression 
method. Section 5 is a discussion on possible extensions. 



2. The Bernstein priors 

2.1. Bernstein polynomial geometry 

Let F a (t) = X^r=o a ifi,n{t/T). This subsection presents a list of statements concern- 
ing the relationship between the shape of F a and its coefficients ao, ■ ■ ■ , a n . This list 
extends that in Chang et al. [7[ and is by no means complete; similar statements 
can be made for a monotone and convex or a monotone and sigmoidal function, for 
example. They are useful in taking into account the geometric prior information for 
regression analysis. 

Proposition 1. (i) (Monotone) If ao < a\ < ■ ■ ■ < a n , then F^(t) > for every 

te[Q,T]. 

(ii) (Unimodal Concave) Letn > 2. If a\—ao > 0, a n — a n _i < 0, and o,i + i+a;_i < 
2a ly for every i = l,...,n-l, then i^(0) > 0, F' a (r) < 0, and F^(t) < for 
every t € [0, r]. 

(Hi) (Unimodal) Let n > 3. If ao = a>i = • • ■ = < a h+i < a/1+2 < • ■ ■ < a; 2 
and ai 2 > a; 2 +i > • • • > a; 3 > aj 3 +i = • • • = a n for some < l\ < h < 
h < n, then there exists s 6 (0, r) such that s is the unique maximum point 
of F a and F a is strictly increasing on [0, s] and strictly decreasing on [s,r]. 
Furthermore, if l\ > 0, then Fa l \o) = for i = 1, . . . ,l\, and if I3 < n — 1, 
then F { a i] {T) = for i = 1, . . . ,n - l 3 - 1. 

In this paper, derivatives at and r are meant to be one-sided. We note that, 
in Proposition 1, (ii) provides a sufficient condition under which F a is a unimodal 
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concave function and (iii) provides a sufficient condition under which F a is a uni- 
modal smooth function whose function and derivative values at and r may be 
prescribed to be 0. Let F a satisfy (iii) with r = f and l\ > 1 , and let F be defined 
by F(t) = F a (t — ti/t — rx)J( Tl)T i(t) for some t\ £ (0, r), then F is a non-negative 
smooth function on [0, r] and it is zero initially for a while, then increases, and 
finally decreases. As we mentioned earlier, functions with this kind of shape restric- 
tion are useful in the study of time course expression profile of a virus gene. We note 
that the results presented here are for concave regressions or unimodal regressions 
and similar results hold for convex regressions. 

Proof. Without loss of generality, we assume r = 1. Noting that the derivatives 
Kit) = nY%=o(a i+ i - ai)<p it n-i(t) and F^(t) = n(n - 1) Y<i=o( a i+2 ~ 2a i+i + 
o-i)fi,n-2(t), we obtain (i) and (ii) immediately. We now prove (iii) for the case 
h = 1 and I3 = n — 1; the proofs for other cases are similar and hence omitted. 

Let <p(x) = Y^Ci~Hoi+i - <H)x\ then F' a (t) = n(l - i)"-*^^). Using 
a = cii < a 2 < ■ ■ ■ < ai and a; > a; +1 > • • ■ > a n _i > a n , we know tp is 
a polynomial whose number of sign changes is exactly one. This together with 
Descartes' sign rule (Anderson et al. [l|) implies that tp has at most one root in 
(0, 00), and hence, F' a has at most one root in (0, 1). 

Using ciq — a\ and 0,2 — a\ > 0, we know F' a (e) > for e being positive and 
small enough. This combined with ^(1) = n(a n — a n -i) < shows that F' a has 
at least one root in (0,1). Thus, F' a has exactly one root s in (0,1), and F' a is 
positive on (0, s) and negative on (s, 1]. Therefore, the conclusion of (iii) follows. 
This completes the proof. □ 

The following proposition complements Proposition 1 and provides Bernstein- 
Weierstrass approximations for functions with specific shape restrictions. 

Let l£' = {F a I a e B n , a < 01 < • • ■ < a n }, IrP = {F a \ a e B n , 01 - a > 
0, a n - a n _i < 0, Oi+i + Oj_i < 2aj, for i = 1, . . . , n — 1}, and I n = {F a \ a e 
B n , eto = ai < C12 < ■ ■ ■ < ai, ai > ai + \ > ■ ■ • > a n -i > a n, for some I = 2, . . . ,n — 
1}. Then we have 

Proposition 2. fi) (Monotone) Let T>\ consist of linear combinations of ele- 
ments in In \ with non-negative coefficients. Then the closure ofT>\ in 
uniform norm is 'precisely the set of increasing and continuous functions on 
[0,r]. 

(ii) (Unimodal Concave) Let T>2 consist of linear combinations of elements in 
[Jn=2 In \ with non-negative coefficients. Let S denote the set of all contin- 
uously differentiable real-valued functions F defined on [0, r] with F'(0) > 0, 
F'(t) < 0, and F' decreasing. For two continuously differentiable functions f 
and g, define d(f,g) = \\f - g\\oo + \\f - g'\\oo, where \\ ■ is the sup-norm 
for functions on [0,r]. Then the closure ofT>2, under d, is S. 
(iii) (Unimodal) Let T>^ = \S^ = ^In ■ Let S denote the set of all continuously 
differentiable real-valued functions F defined on [0, r] satisfying the properties 
that F'(0) = and that there exists s £ [0, r] such that F'(s) = Q,F'(x) > 
for x £ [0, s], and F'(x) < for x £ [s, r]. For two continuously differentiable 
functions f and g, define d(f,g) = \\f - gW^ + \\f - g'\\oo, where || ■ ||oo is 
the sup-norm for functions on [0, t]. Then the closure ofT>3, under d, is S. 

Proof. We give the proofs for (i) and (ii) , and omit the proof for (iii) , because it is 
similar to that for (ii). 
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(i) It is obvious to see that the closure of T>\ is contained in the set of increasing 
and continuous functions. We now prove the converse. Let F be increasing and con- 
tinuous. Taking eij = F(ir/k), for i = 0, 1, . . . , k, we set F(m(£) — Si=o a i<Pi,k{t/T~). 
Then F( k ) is in Z?i. It follows from Bernstein- Weierstrass approximation theorem 
that Fity converges to F uniformly. This completes the proof. 

(ii) It follows from (ii) in Proposition 1 that T>2 C S. Because S is complete 
relative to d, we know X>2 C S. Here T>2 is the closure of T>2- We now prove S C T>2- 

Let F £ S. Let d = F'(0) + l/n 3 , d„_i = F'(t) - 1/n 3 , d t = F'(ir/(n - 1)), 
for i = 1, ... ,n - 2, and i?i,„(*) = di<p ijn -i(t/T). Note that i?i,„(0) > 0, 

Hi,,i(t) < 0. Using Bernstein- Weierstrass Theorem, we know f/i jrl converges to F' 
uniformly. 

Let a = F(0)n/r, a { = a + d Q H h di-x, H 0>n (t) = (r/n) J2™=o a * i PiA t / T )- 

Then H^„(t) = and flo,n(0) = F(0L Thus ff ,n converges uniformly to F 

(See, for example, Theorem 7.17 in Rudin [30(). 

Using the fact that dt is a decreasing sequence with do > 0, rf n _i < 0, and 

a i+2 ~ 2a i+1 + aj = d i+ i — di, 

we know a\ — a > 0, a„ — a n _i < and a,i + 2 + a ; < 2a i+1 . Thus -ffo.n is in £>2- 
This shows that F is in 2?2- This completes the proof. □ 

2.2. Bayesian regression 

We now describe a Bayesian regression model with the prior distribution, on the 
regression functions, introduced by random Bernstein polynomials (1.1). 

Assume that on a probability space (B x R°°, F, V), there are random variables 
{Yjk \j — 1 , . . ■ , rrik] k = 1, . . . , K} satisfying the property that, conditional on 
B = (n,b n ), 

Y jk = F bn (X k ) + ejfc , (2.1) 

with {tjk \j = 1, . . . , to/;,; k = 1, . . • , K} being independent random variables, ejk 
having known density g k for j = 1, . . . , m k , B being the projection from B x K.°° 
to B, F(, n being the function on [0,r] associated with (n,b n ) s B defined in (1.1), 
Xx, . . . , Xfc being constant design points, T being the Borel a— field on B x 
We also assume the marginal distribution of V on B is the prior it. 

Our purpose is to illustrate a Bayesian regression method and the above math- 
ematical formulation is only meant to facilitate an simple formal presentation. In 
fact, V is constructed after the prior and the likelihood are specified. A natural way 
to introduce the prior it is to define 7r(n, b n ) = p(n)TT n (b n ), with X)^Li-P( 71 ) = 1 
and 7r„ a density function on B n ; in fact, 7r„(-) is the conditional density of tt on 
B n and also denoted by tt(- | {n} x B n ). Given B = (n,b n ), the likelihood for the 
data {(Xk,Yjk) | j = 1,. • ■ ,m fc ; fc = 1, . . . , K} is 

K mj. 

nn^(^-^„(A fe )). 

Thus the posterior density v of the parameter (n, b n ) given the data is propor- 
tional to 

II II 9k(Y,k - F bn (X k ))7r n (b n )p(n), 
fc=lj'=l 
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where (n, b n ) £ B. We note that, although we assume gt is known in this paper, 
the method of this paper can be extended to treat the case that gk has certain 
parametric form with priors on the parameters. 

2. 3. Support of Bernstein priors 

The following propositions show that the support of the Bernstein priors can be 
quite large. 

Proposition 3. (Monotone) Let B n ^ = {b n £ B n : F bn £ In^}- Assume p(n) > 
for n — 1,2,..., and the conditional density 7r„(6 ,n, bi,n, ■ ■ ■ , b n , n ) of tt(- | {n} x 
Bn^) has support B^ for infinitely often n. Let F be a given increasing and contin- 
uous function on [0 , r] . Then 7r{(n, b n ) £ {S^= 2 (.{ n } x Bn^) ■ \\F bn - F\\<x < e} > 
for every e > 0. 

Proof. Let = ^2" =0 F(iT/n)ip iyn (t/T). Using Bernstein- Weierstrass Theo- 

rem, we can choose a large n\ so that ||-F( ni ) — -Flloo < e/2,P(ni) > 0, and 
7r(- I {ni} x Bn}) > has support B„} . Combining this with ||F;, n — i^Uoo < 
max i=0 ,..., n {| bi, n - F(ir/n) |} for b n £ B n , we get 

n{{n,b n ) £ B : \\F bn - F\\ x < e} 

> ir{(n l7 b ni ) £ B : \\F bni - F^W^ < |} 

>7r{(m,&„i) e {m} x B« : . max \b hni - F(— )| < J}, 

which is positive. This completes the proof. □ 

Remarks. If we know c\ < F(t) < Co, then it suffices to assume ir n has support 

{(6 ,n, • ■ • , frn,n) £ B^ \ c\ < o ,„, b n ^ n < c } in Proposition 3. Statements similar 
to Proposition 3 can also be made for concave functions. In fact, we have 

(2) (2) 

Proposition 4. (Unimodal Concave) Let B n = {b n £ B n : F bn £ I n }• Assume 
p(n) > for n = 1,2,..., and the conditional density 7T„(6o,rt, &i,n> • • • , b n , n ) of 
7r(- 1 {n} x Bn ' 1 ) has support B n 2 ^ for infinitely often n. Let F be a continuously 
differ entiable real-valued function defined on [0,r] with F'(0) > 0, F'(t) < 0, and 
F> decreasing. Then 7r{(n, b n ) £ Un= 2 (W x B n 2) ) : \\F bn -F\\ O0 + - F'W^ < 
e} > for every e > 0. 

Proposition 5. (Unimodal) Let B n ' — {b n £ B n : F bn £ I n '}. Assume p(n) > 
for n = 1,2,..., and the conditional density 7r„(&o.n, &i,ni • • • > &n,n) o/ 7r(-| {n} x 
B^) has support B^ for infinitely often n. Let F be a continuously differentiable 
real-valued function defined on [0, r] satisfying the properties that F'(0) — and that 
there exists s £ (0, r) such that F'(s) = 0, F'(x) > for x £ [0, s], and F'(x) < for 
x£[s,t}. Thenn{(n,b n ) EUZiiMxB^) : WF^-F^ + WF^-F'W^ < e} > 
/or every e > 0. 

3. Bayesian inference 

Instead of defining priors explicitly, we propose algorithms to generate samples from 
the Bernstein priors that incorporate geometric information. We also propose algo- 
rithms for generating posterior distributions so as to do statistical inference. We con- 
sider only isotonic regression and unimodal concave (convex) regression in the rest 
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of this paper, because we want to compare our method with the density-regression 
method. Algorithms 1, 2 and 3 respectively generate prior for isotonic regression, 
unimodal concave regression and unimodal convex regression. Both Algorithm 4, an 
independent Metropolis algorithm (IMA), and Algorithm 5, a Metropolis-Hastings 
reversible jump algorithm (MHRA), can be used to generate posterior for isotonic 
regression. Algorithms generating posterior for unimodal concave (convex) regres- 
sion can be similarly proposed; they are omitted to make the paper more concise. 

Algorithm 1. (Bernstein isotonic prior) 

Let = e~ a + ae~ a , p(n) = a n e~ a /nl for n = 2, . . . , no — 1, and p(n ) = 
1 — Y^=i P( n )- Let qi be a density so that its support contains F(0). Let q 2 be 
a density so that its support contains F(t). The following steps provide a way to 
sample from an implicitly defined prior distribution. 

1. Generate n ~ p. 

2. Generate ao ~ qi, a n ~ qi. 

3. Let U\, U2, ■ ■ ■ , U n -i be a random sample from Uniform(ao, a n ). Let E/m, 
£7(2), • ■ • ,f7(n-i) be the order statistics of {Ui, U2, ■ ■ ■ , U n -i}- Set a\ — Um, 

a 2 = U( 2 ), ■ ■ ■ 1 O-n-1 = t7( n _i). 

4. (n, ao, . . . , a n ) G B is a sample from the prior. 

Algorithm 2. (Bernstein concave prior) 

Let p(2) = (1 + a)e~ a + a 2 e - Q /2, p(n) = a n e- a /nl for n = 3, . . . , n - 1, and 
p(no) = 1 — X^nl^ 1 Let g be a density with its support containing the mode of 
the regression function. Let (3\ be a lower bound of F(0). Let 02 be a lower bound 
of F(t). The algorithm has the following steps. 

1. Generate n ~ p. 

2. Randomly choose / from {1, 2, . . . , n — 1}. 

3. Generate ai ~ q. 

4. Generate a ~ Uniform(—ai + 2f3\,ai) and a„ ~ Uniform(—ai + 202, ai). 

5. Let < J7 2 < ••• < Ui-i be the order statistics of a random sample from 
Uniform(ao,ai). Denote by c\ < c 2 < • • • < ci the order statistics of {a; — 
U1-1, U1-1 - U1-2, ...,U 2 -Ui, Ui -a }. Then set a ± = a + c ( ,a 2 = a + c ; + 
Cj_i, . . . , a;_i = a + cH hc 2 . 

6. Let V\ < V2 < ■ ■ ■ < V n -i-i be the order statistics of a random sample 
from Uniform(a ni ai). Denote by c[ < c' 2 < ■ ■ ■ < c' n _ l the order statistics 
of {ai - V n -i-\,V n -i-i - V n -i-2, ■ ■ ■ , V 2 - Vi,Vi - a n }. Then set a w = 
at - ci, oj+2 = m - ci - c 2 , . . . , a„_i = ai - c[ c ' n -i-i- 

In the above algorithms, the conditional distributions of 7r(- | {n} x S„) are de- 
fined to be those of (ao, 01, . . . , a„). Although these two algorithms might look a 
little ad hoc, the main idea is to produce random sequence ao, ai, . . . , a n satisfying 
conditions in the propositions in Section 2. It is obvious that ao < ai < • • • < a n 
in Algorithm 1 and that a\ — a > 0, a„ — a„_i < 0, and a i+2 + a^ < 2a i+ i in Al- 
gorithm 2. It follows from Proposition 1 that the support of the prior generated by 
Algorithm 1 (Algorithm 2) contains only monotone (unimodal concave) functions. 
The following Algorithm 3 will be used in the simulation study. 

Algorithm 3. (Bernstein convex prior) 

Let p(2) = (1 + a)e-° + a 2 e - a /2, p(n) = a n e - a /nl for n = 3, . . . , n - 1, 
and p(uq) = 1 — X)"=2 1 ?'( n )- 9 be a density with its support containing the 
minimum value of the regression function. Let (3\ be a upper bound of F(0), and 
P2 be a upper bound of F(t). The algorithm has the following steps. 
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1. Generate n ~ p. 

2. Randomly choose Z from {1, 2, . . . , n — 1}. 

3. Generate a; ~ q. 

4. Generate ao ~ Uniform(ai, 2/3\ — oj) and a„ ~ Uniform(ai, 2f3 2 — ai). 

5. Let ZTi < ^2 < • • ■ < C7"i_i be the order statistics of a random sample from 
Uniform(ai, ao). Denote by ci < C2 < • ■ ■ < ci the order statistics of {ao — 
Ui-i, Ui-i - Ui-2, . . . ,U 2 -Ui,U 1 - ai}. Then set a\ — a - c u a 2 = a - a - 
ci-i, • • • , a ; _i = a — ci c 2 . 

6. Let Vi < Vi < ■ ■ ■ < Vn-i-i be the order statistics of a random sample 
from Uniform(ai,a n ). Denote by c[ < c' 2 < ■ ■ ■ < c' n _ t the order statistics 
of {a n - V n -i-\,Vn-i-\ - V n -i-2, ■ ■ ■ , V 2 - Vi,Vi - ai}. Then set a i+ i = 
oj + c' x , a; +2 = a; + c'i + c' 2 , . . . , a n _i = a; + c'j H h c' n _ l _ 1 . 

Algorithm 4. (IMA for the posterior in isotonic regression) 

This algorithm uses the independent Metropolis approach (see, for example, 
Robert and Casella [29J, page 276) to calculate the posterior distribution v of (n, b). 
Let x = (n, ao, . . . , a„) be generated by the prior. We describe the transition from 
the current state x^ = (n', a' ,..., a' n ,) to a new point by 

!.,, , . L v(x)Tr n '(a' ,...,a',)p(n r ) 
x, with prob. mm < 1, f , . , 

\ ^(x( t ))7r„(a ,...,a„)p(n) 
ceO, o.w. 

The posterior distribution of (n, a) in turn produces the posterior distribution of 
F a , and the Bayes estimate we consider is the mean of F a . 

Algorithm 5. (MHRA for the posterior in isotonic regression) 

Let B( n ) = {(n, ao, ■ ■ ■ , a n ) | (ao, . . . , a„) £ }• Let £ (t) = (n, a , . . . , a„) £ 
B( n ) be the current state. We describe the transition from x^' £ B( n ) to a new 
point by the following algorithms. 

Randomly select one of three types of moves H, H + , or Here if is a tran- 
sition of element in Bi n \, H + a transition of element from -B( n ) to B(„+i), and 
a transition of element from Br n \ to Bt n _i\, respectively. Denote by P^, 
P^j + and Pg- the probabilities of selecting the three different types of moves 
H, H + and H~ when the current state of the Markov chain is in Br n y. We set 
P 1 ^ = P™\ = 0. Consider P« = 1 - P« + - P^_ , P™ + = cmin{l, £g±±>} and 

Ptf- = cminjl, P< ^~^ } ; where p is given in Algorithm 1 and c is a sample param- 
eter. Suppose Mi < F < M 2 . 
If the move of type H is selected, then 

1. select k randomly from {0, 1, 2, . . . , n) so that there is 1/3 probability of 
choosing or n and 1/3 (n — 1) probability of choosing any one of the rest; 
generate V ~ Uniform(ck—i, c/c+i), with c_i = M\, Cn+i = M 2 , and Ck = tifc 
for fc = 1, . . . , n; 

2. let j/M be the vector x^ with afe replaced by V; 

3. set the next state 



x 



(t+i) = J 2/ (t \ with prob. p = min{l, ^T^yy } 



If the move of type i? + is selected, then 
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1. generate V ~ Uniform(a , a n ) and assume a k < V < a k +i; 

2. let = (n+l,a ,ai,...,a k ,V,a k+ i,...,a n ) G B (n+1) ; 

3. set the next state 

_(t+i) _ J J/ w , with prob. p = ram i, , , }, 
£ < i/(a;w) x p[n + 1) x (n) 

[x^, o.w. 

If the move of type H~ is selected, then 

1. select k uniformly from {1, 2, . . . , n — 1}; 

2. let 2/M = (n- l,oo,ai, . . . ,afc_i,a fc+ i, . . . , a„) 6 

3. set the next state 

_(t+i) _ J 2/ w , with prob.p = min i, - , , — — -}, 

x — \ i/(a;w) X p(re - 1) X (a„ - a Q ) 

I a;(*), o.w. 



4. Numerical studies 



We now explore the numerical performance of the Bayes methods in this paper. 
Viewing the posterior as a distribution on regression functions, we can use the 
posterior mean F of the regression functions as the estimate; namely, 

1 m 
m * — ' 

3=1 

where to is a large number, and £>W, b^ 2 \ . . . are chosen randomly according to the 
posterior distribution. The performance of F is evaluated by the Li-norm, sup-norm 
and mean square error (MSE) of F — F, as a function on [0,1]. Here F denotes 
the true regression function. We note that, in the studies of isotonic regression 
and concave (convex) regression, F is a reasonable estimate because monotonicity 
and concavity (convexity) are preserved by linear combination with non-negative 
coefficients, and when studying other shape restricted regression, it might be more 
appropriate to use posterior mode as the estimate. 

Numerical studies in this section include comparison between our method and 
the density-regression method. In order to make the comparison more convincing, 
we study in this section both isotonic regression and convex regression, instead of 
concave regression, because they are studied by Dette and coauthors. 

In this section, r = 1; K = 100; Xi,X 2 , ■ ■ . , ^ioo are i.i.d. from Uniform(0, 1); 
TOfc = 1 for every k = 1,2, . . . , 100; g k is Normal(0, a 2 ) with a = 0.1 or 1 in the 
data generation. When carrying out inference, a is estimated from the data. 



4-1. Isotonic regression 

Our simulation studies suggest that compared with the density-regression method 
of Dette et al. .11], our method performs comparably when the noise is large and 
better when the noise is small. We studied all the regression functions in Dette et 
al. [ll|], Dette and Pilz [13], and Dette [13], and all the results are similar; hence, 
we only report the results for two of the regression functions, which are defined by 



F <1> (i)=sin(5t), 



(4.1) 
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F<2>{t) 



2t for t € [0,0.25], 
0.5 for t e [0.25,0.75], 
2t - 1 forte [0.75, 1]. 



(4.2) 



We note that these are not polynomials and our method usually performs bet- 
ter for polynomials. We also note that the notation n in (4.1) is the ratio of the 
circumference of a circle to its diameter. 

We use Algorithm 1 with no — 20, a — 10, q\ ~ Uniform(qn, 512), Q2 ~ 
Uniform(q2i, 922) to generate the prior distribution. Here qi%, qi2, 921 and (722 
arc defined as follows. Let Xn-j < X( 2 ) < • ■ • < -XVioo) De the order statistics 



for {Xi,X 2 ,...,Xioo}. Let = Yj if X {i) = Xj. Then q n = min{Yjj] | i = 
1,2,..., 10}, q 12 =q 21 = ^ J2l=i Yi ^ 922 = max/Y^ | i = 91, 92, . . . , 100}. We 



note that the choice of n$ — 20 and a = 10 makes it relatively uninformative in the 
order of the polynomial. 

We use Algorithm 5 (MHRA) with c = 0.35, Mi — qn, M2 — 922 and estimated 
a 2 to generate the posterior distribution. We note that this choice of c allows rela- 
tively large probabilities of changing the order of the polynomial and a 2 is estimated 
by T5§ Si=i(^[i+i] ~ ^[j]) 2 ' wmcn is also used in the following density-regression 
method. We run MHRA for 100,000 updates; after a burn in period of 10,000 up- 
dates, we use the remaining 90,000 realizations of the Markov chain to obtain the 
posterior mean. Starting from generating the data {Xi, Yi\i — 1,2,..., 100}, the 
above experiment is replicated 200 times; the averages of the resulting 200 Li- 
norms, sup-norms and MSE of F — F are reported in Table 1 and Table 2, which 
also contain the corresponding results using the density-regression method; the 
method of this paper is referred to as the Bayes method. Also contained in Table 1 
and Table 2 are the posterior distributions of the polynomial order; these posterior 
distributions seem to suggest that n = 20 in the prior is large enough. Figure 1 
contains the autocorrelation plots of the Li-norm of F b (j) for one of the 200 repli- 
cates in the study of L<i>. Figure 1 indicates that MHRA behaves quite nicely. 
Those for L , <2 > look similar and hence are omitted. The corresponding results us- 
ing IMA are also omitted, because they are similar to those using MHRA, except 
having larger autocorrelation values and smaller effective sample sizes. Concepts of 
autocorrelation and effective sample size can be found in Liu [3], for example. 

4-2. Convex regression 

The main conclusion of our simulation studies regarding convex regression is sim- 
ilar to that reported for isotonic regression; namely, compared with the density- 
regression method for convex regression studied by Birke and Dette 0] , our Bayes 
method performs comparably when the noise is large and performs better when the 
noise is small. To make the presentation concise, we only report the results for two 
of the regression functions, which are defined by 



^< 3 >W = (16/9)(t-l/4) 2 , 



(4.3) 




(4.4) 



We note that both F <3> and L 1 <4> are examples in Birke and Dette 
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Table 1 

Simulation study for F < i > (t) = sin(|t). 



a 





\F-F\ 


Bayes 


Density 








-regression 




Ll-norm 


0.0161 


0.0238 


0.1 


Sup-norm 


0.0532 


0.0874 




MSE 


0.0004 


0.0010 




Ll-norm 


0.1148 


0.1194 


1 


Sup-norm 


0.2795 


0.2651 




MSE 


0.0215 


0.0233 



b 



0~ 


n 


Posterior 


n 


Posterior 


n 


Posterior 


n 


Posterior 






probability 




probability 




probability 




probability 




1 


0.0000 


6 


0.0797 


11 


0.0960 


16 


0.0139 




2 


0.0300 


7 


0.0994 


12 


0.0708 


17 


0.0076 


0.1 


3 


0.0433 


8 


0.1009 


13 


0.0497 


18 


0.0037 




4 


0.0483 


9 


0.1109 


14 


0.0359 


19 


0.0016 




5 


0.0789 


10 


0.1044 


15 


0.0235 


20 


0.0015 




1 


0.0006 


6 


0.0673 


11 


0.1104 


16 


0.0200 




2 


0.0026 


7 


0.0950 


12 


0.0878 


17 


0.0113 


1 


3 


0.0077 


8 


0.1164 


13 


0.0687 


18 


0.0068 




4 


0.0196 


9 


0.1308 


14 


0.0495 


19 


0.0042 




5 


0.0406 


10 


0.1252 


15 


0.0319 


20 


0.0036 



We use Algorithm 3 with no — 20, a = 10, q ~ Uniform(qoi,qo2), Pi = <73 and 
P2 — Qi to generate the prior distribution, where qoi, qo 2l <?3 and q^ are defined as 
follows. Let im < Yt 2 ) < ■ • ■ < Y(wo) be the order statistics for {Yi, Y 2 , . . . , Yioo}- 
Then q 01 = i^Y^, q Q2 = \q 01 + ^^1=1^1/^ Let X (1) < X {2) < ••• < 
A(ioo) be the order statistics for {X lt X 2 , . . . , X wo }. Let Yui = Yj if X^ = Xj. 
Then q^ = maxlYj^ | i = 1, 2, . . . , 5} and q^ = max{Yr i i | i = 96, 97, . . . , 100}. 

We use MHRA to generate the posterior distribution with similar parameters 
given in Algorithm 5. Table 3 and Table 4 contain the main results. Entries in 
Table 3 and Table 4 bear the same meanings of the corresponding entries in Table 
1. Figure 2 carries similar information as Figure 1. 

Table la gives comparison between the Bayes method and the density-regression 
method. Table lb gives the posterior probability of the order of the Bernstein 
polynomial. The effective sample size for a = 0.1 (1) is 5623 (461). The acceptance 
rate for a = 0.1 (1) is 0.2923 (0.6635). 

Table 2a gives comparison between the Bayes method and the density-regression 
method. Table 2b gives the posterior probability of the order of the Bernstein 
polynomial. The effective sample size for a = 0.1 (1) is 4913 (364). The acceptance 
rate for a = 0.1 (1) is 0.3779 (0.6943). 

Table 3a gives comparison between the Bayes method and the density-regression 
method. Table 3b gives the posterior probability of the order of the Bernstein 
polynomial. The effective sample size for a = 0.1 (1) is 1397 (344). The acceptance 
rate for a = 0.1 (1) is 0.2413 (0.3430). 

Table 4a gives comparison between the Bayes method and the density-regression 
method. Table 4b gives the posterior probability of the order of the Bernstein 
polynomial. The effective sample size for a = 0.1 (1) is 722 (216). The acceptance 
rate for a = 0.1 (1) is 0.2217 (0.3670). 
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Table 2 

C 2t for t e [0,0.25], 

Simulation study for F <2> {t) = < 0.5 for t G [0.25,0.75], 

[ 2t - 1 for t 6 [0.75, 1]. 



a 





\F-F\ 


Bayes 


Density 








-regression 




Ll-norm 


0.0353 


0.0413 


0.1 


Sup-norm 


0.1002 


0.1415 




MSE 


0.0019 


0.0030 




Li-norm 


0.1255 


0.1267 


1 


Sup-norm 


0.3035 


0.3509 




MSE 


0.0244 


0.0256 



b 



a 


n 


Posterior 
probability 


n 


Posterior 
probability 


n 


Posterior 
probability 


n 


Posterior 
probability 




1 


0.0000 


6 


0.0144 


11 


0.1481 


16 


0.0375 




2 


0.0000 


7 


0.0385 


12 


0.1730 


17 


0.0194 


0.1 


3 


0.0000 


8 


0.0420 


13 


0.1571 


18 


0.0117 




4 


0.0000 


9 


0.0621 


14 


0.1142 


19 


0.0062 




5 


0.0005 


10 


0.1006 


15 


0.0692 


20 


0.0055 




1 


0.0004 


6 


0.0611 


11 


0.1169 


16 


0.0195 




2 


0.0016 


7 


0.0871 


12 


0.0992 


17 


0.0113 


1 


3 


0.0062 


8 


0.1100 


13 


0.0794 


18 


0.0060 




4 


0.0177 


9 


0.1245 


14 


0.0539 


19 


0.0029 




5 


0.0379 


10 


0.1284 


15 


0.0335 


20 


0.0025 



Table 3 

Simulation study for F <3> (t) = (16/9)(t - 1/4) 2 . 



a 





\F-F\ 


Bayes 


Density 








-regression 




Ll-norm 


0.0157 


0.0775 


0.1 


Sup-norm 


0.0525 


0.2296 




MSE 


0.0004 


0.0101 




Ll-norm 


0.1362 


0.1366 


1 


Sup-norm 


0.4247 


0.3643 




MSE 


0.0319 


0.0292 



b 



o 


n 


Posterior 


n 


Posterior 


n 


Posterior 


n 


Posterior 






probability 




probability 




probability 




probability 




2 


0.0001 


7 


0.1177 


12 


0.0452 


17 


0.0064 




3 


0.0609 


8 


0.1312 


13 


0.0346 


18 


0.0029 


0.1 


4 


0.0826 


9 


0.1139 


14 


0.0230 


19 


0.0020 




5 


0.0931 


10 


0.0938 


15 


0.0147 


20 


0.0021 




6 


0.1015 


11 


0.0653 


16 


0.0090 








2 


0.0413 


7 


0.1094 


12 


0.0516 


17 


0.0062 




3 


0.0602 


8 


0.1175 


13 


0.0313 


18 


0.0028 


1 


4 


0.0728 


9 


0.1110 


14 


0.0204 


19 


0.0015 




5 


0.0849 


10 


0.0886 


15 


0.0141 


20 


0.0014 




6 


0.1037 


11 


0.0717 


16 


0.0096 
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Table 4 

{-At + 1 for t 6 [0,0.25], 

for te [0.25,0.75], 

4t-3 for t 6 [0.75, 1] . 



a 





\F-F\ 


Bayes 


Density 








-regression 




Ll-norm 


0.0405 


0.1311 


0.1 


Sup-norm 


0.1381 


0.4663 




MSE 


0.0026 


0.0282 




Ll-norm 


0.1394 


0.2265 


1 


Sup-norm 


0.4603 


0.7010 




MSE 


0.0338 


0.0793 



b 



a 


n 


Posterior 


n 


Posterior 


n 


Posterior 


n 


Posterior 






probability 




probability 




probability 




probability 




2 


0.0000 


7 


0.0000 


12 


0.1710 


17 


0.0263 




3 


0.0000 


8 


0.0205 


13 


0.1301 


18 


0.0158 


0.1 


4 


0.0000 


9 


0.1144 


14 


0.1130 


19 


0.0124 




5 


0.0000 


10 


0.1028 


15 


0.0778 


20 


0.0187 




6 


0.0000 


11 


0.1498 


16 


0.0474 








2 


0.0153 


7 


0.1194 


12 


0.0571 


17 


0.0068 




3 


0.0295 


8 


0.1254 


13 


0.0432 


18 


0.0041 


1 


4 


0.0539 


9 


0.1153 


14 


0.0321 


19 


0.0013 




5 


0.0784 


10 


0.0953 


15 


0.0227 


20 


0.0014 




6 


0.1065 


11 


0.0786 


16 


0.0137 







0.9 
0.8 
0.7 
0.6 
0.5 
0.4 
0.3 
0.2 
0.1 


-0.1 



x 10 



0.9 
0.8 
0.7 
0.6 
0.5 
0.4 
0.3 
0.2 
0.1 


-0.1 




x 10 



Figure la: a =0.1 



Figure lb: = 1 



Fig 1 . Autocorrelation plots for the MHRA in the data generation from the posterior distribution 
for F K1> (t) = sin(§t). 
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02468 02468 
La S x10 4 Lag x10 4 

Figure 2a: a = 0. 1 Figure 2b : a = 1 

Fig 2. Autocorrelation plots for the MHRA in the data generation from the posterior distribution 
forF <3> (t) = (16/9)(t- 1/4) 2 . 



5. Discussion 

We have proposed a Bayes approach to shape restricted regression with the prior 
introduced by random Bernstein polynomials. In particular, algorithms for gener- 
ating the priors and the posteriors are proposed and numerical performance of the 
Bayes estimate has been examined in some detail. The usefulness of this approach 
is successfully demonstrated in simulation studies for isotonic regression and con- 
vex regression, which compares our method with the density-regression method. 
We note that certain frequentist properties of this Bayes estimate are established 
in Chang et al. 0. The method of this paper can also be used to assess the validity 
of the shape restriction on the regression function, by considering the predictive 
posterior assessment proposed by Gelman et al. [14[ and Gelman [13(; in fact, we 
have implemented this assessment and found it quite satisfactory. We would like to 
remark that this approach can be used to study other shape restricted statistical 
inference problem. In fact, as pointed out by P. Bickel and M. Woodroofe in the 
Vardi memorial conference, the geometry of Bernstein polynomials may be utilized 
to propose a penalized likelihood approach to shape restricted regression; our pre- 
liminary simulation studies (not reported here) do suggest that penalized maximum 
likelihood estimate looks promising, and further investigation is underway. 
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